MP Server / Help needed: reverse-engineer on-disk cache format in MP V9.5
« on: March 11, 2024, 02:08:31 AM »
Background: cache-only setup, database backup failure, cache filesystem backups (with TSM) complete.
Documents stored "as is" with the generic indexer only, so no separation of resources and document data.
I need to recover some documents from an AG where arsmaint -c and arsmaint -d have been run while the segment date was incorrectly set to 1970-01-01 on some documents.
I've restored the affected DOC files from backup (into another folder), both the 1136FAA1 (OD77-compressed metadata) and 1136FAAA (OD77-compressed documents).
I've run arsadmin decompress on the restored files, and have noticed that the ...FAA1 files contain the document metadata, and the ...FAAA etc files contain the documents themselves.
So far, I've noticed that the first line (sometimes the first few lines) begins with "<" and ends with ">", with tab-separated AG field names in between.
Then follow the metadata lines, one per document. Each starts with the values of the AG fields, in the same order as in the <>-enclosed header, followed by some CMOD-specific fields that can look like this:
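A minimal sketch of reading that header, assuming every header line is individually wrapped in "<" and ">" and that the names inside are tab-separated. The function name `parse_header` and the field names in the usage example are made up for illustration; they are not taken from the real file.

```python
from typing import Iterable, List


def parse_header(lines: Iterable[str]) -> List[str]:
    """Collect AG field names from the leading <...> header lines.

    Assumption: each header line starts with '<' and ends with '>',
    and reading stops at the first line that does not.
    """
    names: List[str] = []
    for line in lines:
        stripped = line.rstrip("\r\n")
        if not (stripped.startswith("<") and stripped.endswith(">")):
            break  # first metadata line reached
        # Drop the enclosing brackets, then split on tabs.
        names.extend(n for n in stripped[1:-1].split("\t") if n)
    return names


# Hypothetical field names, only to show the call.
sample = ["<custname\tinvdate>\n", "<acctno>\n", "ACME\t2024-01-31\t42 ...\n"]
field_names = parse_header(sample)
```

If the header turns out to wrap differently (for example one "<" on the first line and the ">" only on the last), the loop condition would need adjusting.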
1136FAAA 0 21945 0 106262 U O 0 1 0
The first is obviously the name of the file containing the documents, and the second and third (0 and 21945) are the byte offset and length of the document, after decompression, in the document data file.
But, what are the other fields? 0, 106262, U, O, 0, 1, 0 ?
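To make the layout concrete, here is a minimal sketch of splitting off those ten trailing fields. It assumes they are whitespace-separated, always the last ten tokens on the line, and never contain spaces themselves. The names `CacheRef`, `unknown_a`, `unknown_b` and `flags` are my own labels, not CMOD terminology, and the AG values in the example are invented.

```python
from typing import NamedTuple, Tuple


class CacheRef(NamedTuple):
    data_file: str          # e.g. "1136FAAA"
    offset: int             # byte offset in the decompressed data file
    length: int             # document length in bytes
    unknown_a: int          # meaning unknown (0 or 106262 in my samples)
    unknown_b: int          # meaning unknown (106262 or 104751 in my samples)
    flags: Tuple[str, ...]  # remaining columns, e.g. ("U", "O", "0", "1", "0")


def parse_trailing_fields(line: str) -> CacheRef:
    """Return the last ten whitespace-separated tokens as a CacheRef."""
    tail = line.split()[-10:]
    if len(tail) < 10:
        raise ValueError("expected at least 10 trailing fields")
    return CacheRef(
        data_file=tail[0],
        offset=int(tail[1]),
        length=int(tail[2]),
        unknown_a=int(tail[3]),
        unknown_b=int(tail[4]),
        flags=tuple(tail[5:]),
    )


# The AG values in front ("ACME", "2024-01-31") are placeholders.
ref = parse_trailing_fields("ACME\t2024-01-31\t1136FAAA 0 21945 0 106262 U O 0 1 0")
```

Taking the last ten tokens rather than counting from the front avoids depending on how many AG fields the header declares.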
In this example file, the first 29 documents make perfect sense. Here is the CMOD-data for documents 27 - 32 in the decompressed 1136FAA1 file:
1136FAAA 572939 21888 0 106262 U O 0 1 0
1136FAAA 594827 21887 0 106262 U O 0 1 0
1136FAAA 616714 21971 0 106262 U O 0 1 0
1136FAAA 0 21894 106262 104751 U O 0 1 0
1136FAAA 21894 22109 106262 104751 U O 0 1 0
1136FAAA 44003 22005 106262 104751 U O 0 1 0
After decompressing the entire file, 1136FAAA is exactly 616714 + 21971 = 638685 bytes, so it ends with document 29. From document 30 on, the offset counter drops back to zero while the second pair of "counters" changes (from 0 / 106262 to 106262 / 104751), and I don't understand where to find these remaining documents.
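A small sketch of the consistency checks I can run on the six sample rows: group consecutive rows by the unexplained second pair, confirm that offsets are contiguous inside each group, and test whether one group's pair "chains" into the next (first value + second value = next first value). That the pair might itself be an offset and length of some further block is only a guess on my part; the code checks the arithmetic, nothing more. All function names are my own.

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[str, int, int, int, int]  # file, offset, length, col4, col5

# Documents 27-32 from the decompressed 1136FAA1 file.
ROWS: List[Row] = [
    ("1136FAAA", 572939, 21888, 0, 106262),
    ("1136FAAA", 594827, 21887, 0, 106262),
    ("1136FAAA", 616714, 21971, 0, 106262),
    ("1136FAAA", 0, 21894, 106262, 104751),
    ("1136FAAA", 21894, 22109, 106262, 104751),
    ("1136FAAA", 44003, 22005, 106262, 104751),
]


def group_by_second_pair(rows: List[Row]):
    """Group consecutive rows that share the same (col4, col5) pair."""
    return [(key, list(members))
            for key, members in groupby(rows, key=lambda r: (r[3], r[4]))]


def is_contiguous(members: List[Row]) -> bool:
    """True if each document starts where the previous one ends."""
    return all(a[1] + a[2] == b[1] for a, b in zip(members, members[1:]))


def pairs_chain(groups) -> bool:
    """True if col4 + col5 of each group equals col4 of the next group."""
    keys = [key for key, _ in groups]
    return all(a[0] + a[1] == b[0] for a, b in zip(keys, keys[1:]))


groups = group_by_second_pair(ROWS)
for key, members in groups:
    print(key, len(members), "contiguous:", is_contiguous(members))
print("pairs chain:", pairs_chain(groups))
```

On these rows both checks hold: the offsets are contiguous within each group, and 0 + 106262 equals the first value of the next pair.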
Does anyone here know what the 0 / 106262 / 104751 in the columns after the first offset+length pairs mean?
Do you think there could be a way to salvage the remaining documents from the cache backups, without using the database?
Thanks!
Matt